Abstract

We present a simple algorithm that generates a De Bruijn sequence of the set of
primitive words of any given length over any alphabet. We also show that the shortest
sequence that contains all squares of length $2n$ over an alphabet of size $k$ has length
between $2k^n$ and $(2 + \frac{1}{k})k^n$.

1 Introduction 



Given an integer $k \geq 2$, we define $\Sigma_k := \{0, 1, \ldots, k-1\}$. A De Bruijn sequence of $\Sigma_k^n$ is a
circular sequence in which every word in $\Sigma_k^n$ appears as a factor exactly once. For example,
00011101 is a De Bruijn sequence of $\{0, 1\}^3$. It has long been known that such a sequence
exists for $\Sigma_k^n$, for any $k, n$. In fact, there are exponentially many such sequences [1].

Moreno [2] extended the notion of De Bruijn sequences to any dictionary $\mathcal{D} \subseteq \Sigma_k^n$ and
defined a De Bruijn sequence of $\mathcal{D}$ to be a circular sequence in which every word in $\mathcal{D}$ (and no
other $n$-tuple) appears exactly once. Moreno also characterized the dictionaries on which
De Bruijn sequences exist by looking at their corresponding De Bruijn graphs.

Given a dictionary $\mathcal{D}$, it is natural to ask the following questions: is there a De Bruijn
sequence on $\mathcal{D}$? If so, can it be efficiently generated? If not, how "far" is $\mathcal{D}$ from
having one?

We try to settle the above questions for two of the most studied sets of words: the primitive
words and the squares. In Section 2, we provide a simple algorithm that generates a De Bruijn
sequence of the primitive words in $\Sigma_k^n$, for any $n, k$. In Section 3, we show that the shortest
sequence that contains all squares in $\Sigma_k^{2n}$ as subwords has length between $2k^n$ and $(2 + \frac{1}{k})k^n$.
2 Primitive words



A word $w$ is primitive if there do not exist a word $x$ and an integer $p \geq 2$ such that $w = x^p$.
Throughout this section, we let $\mathcal{D}$ denote the set of primitive words in $\Sigma_k^n$ for some fixed $k$
and $n$.

It is well known that, to generate a De Bruijn sequence on $\Sigma_k^n$, we can do the
following: write down $n$ zeros, then successively write down the largest symbol that does
not create a subword of length $n$ that has appeared earlier in our sequence, and stop if there
is no such symbol. Somewhat surprisingly, this simple algorithm can be easily adapted to
generate a De Bruijn sequence on $\mathcal{D}$:

Proposition 1. The following algorithm generates a word of length $|\mathcal{D}| + n - 1$ that contains,
as subwords, all primitive words in $\Sigma_k^n$:

(1) Output $0^{n-1}$;

(2) Let $x$ be the previous $n-1$ symbols output. Choose the largest $i$ from $\{0, \ldots, k-1\}$ such
that $xi$ is primitive and has not yet appeared in the word that was output, and output $i$.
Repeat this step, and terminate when there is no choice for $i$.

For example, applying the above algorithm to the set of primitive words in $\Sigma_2^4$, we obtain
the sequence 000111011001000.
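The two steps of Proposition 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `is_primitive` and `primitive_sequence` are ours, symbols are written as decimal digits (so it assumes $k \leq 10$), and primitivity is tested with the standard fact that $w$ is primitive exactly when $w$ occurs in $ww$ only at offsets $0$ and $|w|$.

```python
def is_primitive(w):
    # w is primitive iff its only occurrences inside ww start at 0 and len(w).
    return (w + w).find(w, 1) == len(w)


def primitive_sequence(n, k):
    """Greedy word from Proposition 1 (assumes k <= 10, digits as symbols)."""
    out = "0" * (n - 1)              # step (1): output 0^(n-1)
    seen = set()                     # length-n blocks already present in out
    while True:
        x = out[len(out) - (n - 1):]         # previous n-1 symbols
        for i in range(k - 1, -1, -1):       # try the largest symbol first
            cand = x + str(i)
            if is_primitive(cand) and cand not in seen:
                seen.add(cand)
                out += str(i)
                break
        else:
            return out               # no admissible symbol: terminate


print(primitive_sequence(4, 2))      # the example above: 000111011001000
```

Since every appended symbol creates exactly one new block of length $n$, tracking the accepted blocks in a set is equivalent to searching the output word for earlier occurrences.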

We present a series of small results that lead to showing that our algorithm is correct. 

Lemma 2. For every $u \in \Sigma_k^{n-1}$ and $a \in \Sigma_k$, $au$ is primitive if and only if $ua$ is primitive.

Proof. Suppose $au$ is not primitive, and can be written as $x^p$ for some word $x$ and integer
$p \geq 2$. Then the first symbol of $x$ has to be $a$. If we write $x$ as $ay$, it is easy to see that
$(ya)^p = ua$. Hence $ua$ is not primitive. The converse follows by the symmetric argument, and
our claim follows. □

Next, we need some notation. For any finite word $w$, we let $|w|$ denote its length, and
$w[i]$ denote the $i$-th symbol in $w$. Also, given $1 \leq i \leq j \leq n$, we define $w[i..j]$ to be the
subword $w[i]w[i+1] \ldots w[j-1]w[j]$ of $w$.

Given a primitive word $u$ in $\Sigma_k^n$, we construct the word $f_u$ by the following procedure.
For any $i \leq n$, we let $f_u[i] = u[i]$. For $i \geq n+1$, if $f_u[i-n+1..i-1] = 0^{n-1}$, we terminate.
Otherwise, we define $f_u[i]$ to be the smallest letter such that $f_u[i-n+1..i]$ is primitive. The
following lemma assures that $f_u$ is well-defined:

Lemma 3. Given any $u \in \Sigma_k^{n-1}$, if $u0$ is not primitive, then $u1$ is primitive.
Proof. Suppose for a contradiction that both $u0$ and $u1$ are not primitive, so $u0 = x^p$ and
$u1 = y^q$ for words $x, y$ and integers $p, q \geq 2$. Then we know that $u[s|x|] = 0$ for every
$s \in \{1, \ldots, p-1\}$ and $u[t|y|] = 1$ for every $t \in \{1, \ldots, q-1\}$.

Define $m := \gcd\{|x|, |y|\}$. Since $\frac{|x|}{m}$ and $\frac{|y|}{m}$ are coprime, there exists $s \leq \frac{|y|}{m} - 1$ such that
$s\frac{|x|}{m} \equiv 1 \bmod \frac{|y|}{m}$, which implies that $s|x| \equiv m \bmod |y|$. Since $u[s|x|] = 0$, this implies that
$y[m] = 0$, and so $u[m] = 0$. Conversely, we can find $t \leq \frac{|x|}{m} - 1$ such that $t\frac{|y|}{m} \equiv 1 \bmod \frac{|x|}{m}$.
Then we have $t|y| \equiv m \bmod |x|$ and $u[m] = x[m] = u[t|y|] = 1$, a contradiction. □
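Lemmas 2 and 3 are easy to sanity-check by exhaustive search on short words. The sketch below is not part of the proof; the helper names are ours, symbols are written as decimal digits (so $k \leq 10$ is assumed), and only small values of $n$ are feasible.

```python
from itertools import product


def is_primitive(w):
    # w is primitive iff its only occurrences inside ww start at 0 and len(w).
    return (w + w).find(w, 1) == len(w)


def words(length, k):
    # All words of the given length over {0, ..., k-1}, as digit strings.
    return ("".join(map(str, t)) for t in product(range(k), repeat=length))


def lemma2_holds(n, k):
    # au is primitive iff ua is primitive, for every u of length n-1.
    return all(is_primitive(str(a) + u) == is_primitive(u + str(a))
               for u in words(n - 1, k) for a in range(k))


def lemma3_holds(n, k):
    # u0 and u1 are never both non-primitive.
    return all(is_primitive(u + "0") or is_primitive(u + "1")
               for u in words(n - 1, k))


print(all(lemma2_holds(n, k) and lemma3_holds(n, k)
          for n in range(2, 9) for k in (2, 3)))
```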

Therefore, we conclude that $f_u[i]$ is well defined for all $i$. Furthermore, if $i \geq n+1$, we
know that $f_u[i] \in \{0, 1\}$. Next, we prove a lemma that is key to our main result in this
section.

Lemma 4. For any $u \in \mathcal{D}$, $f_u$ is finite.

Proof. Suppose $0^{n-1}$ never appears in $f_u$, and the construction never terminates. First,
observe that the number of zeros in $f_u[i..i+n-1]$ is no more than that in $f_u[i+1..i+n]$ for
every $i \geq 1$. This is because if $f_u[i] = 0$, and $f_u[i..i+n-1]$ is primitive, then $f_u[i+1..i+n-1]0$ is
primitive by Lemma 2, so the construction sets $f_u[i+n] = 0$.

Since $f_u[i] \in \{0, 1\}$ for all $i \geq n+1$, we see that $f_u$ has to be ultimately periodic, and there
exists $v \in \{0, 1\}^n$ such that $v$ is primitive, but is not primitive if we replace any 1 in $v$ by a
0. Moreover, $v$ has at least two 1's (otherwise we have $0^{n-1}$ in $f_u$). We let $v'$ be the word
obtained from $v$ by replacing the last 1 in $v$ by 0, and $v''$ be the word obtained from $v$ by
replacing the second last 1 in $v$ by 0.

Suppose $v' = x^p$ where $x$ is primitive and $p \geq 2$. Since $x$ cannot be all zeros, we
see that there must be at least two 1's in $v[n-|x|+1..n]$. Therefore, we can write $v''$
as $x^{p-1}x'$, where $x, x'$ have Hamming distance two. Also, since $v''$ is not primitive, there
exists $i \in \{1, \ldots, n-1\}$ such that $v''[i+1..n]v''[1..i] = v''$. If we write $i = l|x| + j$ where
$0 \leq j < |x|$, we have $x' = v''[n-|x|+1..n] = x[j+1..|x|]x[1..j]$. Hence, if we let $a := x[1..j]$,
$b := x[j+1..|x|]$, then $x = ab$, $x' = ba$, and $v'' = (ab)^{p-1}ba = b(ab)^{p-l-2}ba(ab)^{l}a$.

We next derive a contradiction by showing that $ab = ba$. If $l > 0$, then $(ab)^{p-1}ba$ ends
with $bba$ and $b(ab)^{p-l-2}ba(ab)^{l}a$ ends with $aba$, and we're done. Similarly, if $p-l-2 > 0$,
then $(ab)^{p-1}ba$ starts with $ab$ and $b(ab)^{p-l-2}ba(ab)^{l}a$ starts with $ba$, and we can deduce that
$ab = ba$. Otherwise, $l = p-l-2 = 0$ and we have $abba = bbaa$, or equivalently, $abb = bba$.

Suppose there are strings $a, b$ such that $abb = bba$ but $ab \neq ba$. Furthermore, pick $a, b$ such
that $|a|$ is minimized among pairs of strings with this property. If $|b| \geq |a|$, then $abb = bba$
implies that $a$ is a prefix of $b$, and we may write $b = ac$. Then $aacac = abb = bba = acaca$,
and hence $ac = ca$ and $ab = aac = aca = ba$. Otherwise, $|a| > |b|$, $b$ is a prefix of $a$, and we
write $a = bc$. Then we have $bcbb = abb = bba = bbbc$, hence $cbb = bbc$. Since $|c| < |a|$, we
conclude that $bc = cb$ and so $ab = bcb = bbc = ba$.
Therefore, $x = ab$ is not primitive, contradicting our choice of $x$. We conclude that the
construction of $f_u$ must terminate finitely. □
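To make the construction of $f_u$ concrete, here is a minimal Python sketch under the same conventions as the text (digit strings as words, $n \geq 2$, $k \leq 10$). The name `build_f` and the iteration cap are our own additions: each new symbol depends only on the previous $n-1$ symbols, so a run of more than $k^n$ steps would mean the construction cycles, which Lemma 4 rules out.

```python
def is_primitive(w):
    # w is primitive iff its only occurrences inside ww start at 0 and len(w).
    return (w + w).find(w, 1) == len(w)


def build_f(u, k):
    """Return f_u for a primitive word u of length n >= 2 over {0..k-1}."""
    n = len(u)
    f = u
    for _ in range(k ** n + 1):            # safety cap, see the note above
        tail = f[len(f) - (n - 1):]        # last n-1 symbols written so far
        if tail == "0" * (n - 1):
            return f                       # termination rule of the text
        # Smallest letter keeping the last n symbols primitive;
        # by Lemma 3 this is always 0 or 1.
        f += next(str(c) for c in range(k) if is_primitive(tail + str(c)))
    raise RuntimeError("construction did not terminate")


print(build_f("0111", 2))   # 0111000
print(build_f("1101", 2))   # 11011000
```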



We are now ready to verify the correctness of our algorithm: 

Proof of Proposition 1. Let $w$ be the word output by our algorithm. Notice that every block
of $w$ of length $n$ is primitive, and no block repeats. Therefore, it suffices to show that every
primitive word of length $n$ appears as a subword in $w$.

Notice that $w$ must end with $0^{n-1}$. Otherwise, let $x := w[|w|-n+2..|w|]$, and suppose $x$
appeared $p$ times in $w$. The fact that the algorithm terminates at $x$ implies that $|\{i : xi \in \mathcal{D}\}| = p - 1$.
However, since $w$ does not start with $x$, we have $|\{i : ix \in \mathcal{D}\}| \geq p$, contradicting Lemma 2.

Suppose $w$ does not contain all primitive words. Let $jy$ be such a word, where $j \in \Sigma_k$,
$y \in \Sigma_k^{n-1}$. Since $|\{i : iy \in \mathcal{D}\}| = |\{i : yi \in \mathcal{D}\}|$ and $|\{i : iy \text{ is a subword of } w\}| =
|\{i : yi \text{ is a subword of } w\}|$, there exists $j_1 \in \Sigma_k$ such that $yj_1$ is primitive and not a subword
of $w$. In particular, since our algorithm always chooses the largest possible symbol to extend
our sequence, we may assume that $j_1$ is the smallest symbol such that $yj_1$ is primitive.

Applying the same reasoning to $y[2..n-1]j_1$, we conclude that if we let $j_2$ be the smallest
symbol such that $y[2..n-1]j_1j_2$ is primitive, then $y[2..n-1]j_1j_2$ does not appear in $w$.

Proceeding in this way, we conclude that no subword of length $n$ in $f_{jy}$ appears
in $w$. Since $f_{jy}$ ends with $0^{n-1}$, this implies that $w$ misses some primitive word that starts
with $0^{n-1}$. However, if that were the case, the algorithm would not have terminated, and
hence this is a contradiction. □

Thus, we can obtain a De Bruijn sequence by deleting the last $n-1$ symbols of the word
generated by the above algorithm (which are all zeros), and viewing the result as a circular
sequence.
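The passage from the linear word to the circular sequence can be checked directly on small cases. The following sketch regenerates the word of Proposition 1, drops its last $n-1$ symbols, and compares the circular blocks of length $n$ with the set of primitive words. All function names are ours, and symbols are digits, so $k \leq 10$ is assumed.

```python
from itertools import product


def is_primitive(w):
    return (w + w).find(w, 1) == len(w)


def primitive_sequence(n, k):
    # Greedy word of Proposition 1: start with 0^(n-1), prefer the largest symbol.
    out, seen = "0" * (n - 1), set()
    while True:
        x = out[len(out) - (n - 1):]
        for i in range(k - 1, -1, -1):
            cand = x + str(i)
            if is_primitive(cand) and cand not in seen:
                seen.add(cand)
                out += str(i)
                break
        else:
            return out


def circular_blocks(seq, n):
    # All length-n blocks of seq read circularly.
    doubled = seq + seq[:n - 1]
    return [doubled[i:i + n] for i in range(len(seq))]


def is_de_bruijn_for_primitives(n, k):
    w = primitive_sequence(n, k)
    circular = w[:len(w) - (n - 1)]        # delete the trailing n-1 zeros
    blocks = circular_blocks(circular, n)
    primitives = {"".join(map(str, t)) for t in product(range(k), repeat=n)
                  if is_primitive("".join(map(str, t)))}
    # Every primitive word exactly once, and nothing else.
    return len(blocks) == len(primitives) and set(blocks) == primitives


print(is_de_bruijn_for_primitives(4, 2))   # True
```

For $n = 4$, $k = 2$ the circular sequence is 000111011001, whose twelve circular blocks are exactly the twelve primitive binary words of length 4.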



3 Squares 

A word $w$ is a square if $w = xx$ for some word $x$. It is easy to see that, unlike the primitive
words, there is no De Bruijn sequence for the squares, since (PI) fails. In fact, if we let $\mathcal{D}$
be the set of squares in $\Sigma_k^{2n}$, then the De Bruijn graph of $\mathcal{D}$ has as many components as
the number of conjugate classes in $\Sigma_k^n$.

The next question to ask is then, what is the length of the shortest sequence that contains 
all the squares as factors? 
We show in this section that we can include all $k^n$ squares of length $2n$ in a sequence
that has length slightly more than $2k^n$. First, we prove that we cannot do much better than
$2k^n$.

Define an equivalence relation $\sim$ on $\Sigma_k^n$, where $u \sim v$ if they are conjugates of each
other. Let $C(n,k)$ denote the number of conjugate classes in $\Sigma_k^n$. Observe that
$C(n,k) = \sum_{d \geq 1,\ d \mid n} \frac{\varphi(d)\, k^{n/d}}{n}$, where $\varphi(d)$ is Euler's phi function, the number of integers between 1
and $d$ that are coprime with $d$. Notice that $C(n,k) \geq \frac{k^n}{n}$ for all $n, k$.
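The formula for $C(n,k)$ can be cross-checked against a direct count of conjugate classes for small parameters. The sketch below uses our own function names; the brute-force count picks the lexicographically smallest rotation of each word as the representative of its class.

```python
from itertools import product
from math import gcd


def euler_phi(d):
    # Number of integers in 1..d that are coprime with d.
    return sum(1 for m in range(1, d + 1) if gcd(m, d) == 1)


def num_classes(n, k):
    # C(n, k) = (1/n) * sum over divisors d of n of phi(d) * k^(n/d).
    total = sum(euler_phi(d) * k ** (n // d)
                for d in range(1, n + 1) if n % d == 0)
    return total // n


def num_classes_brute(n, k):
    # Count classes by collecting the smallest rotation of every word.
    reps = {min(w[p:] + w[:p] for p in range(n))
            for w in product(range(k), repeat=n)}
    return len(reps)


print(num_classes(3, 2), num_classes(4, 2), num_classes(6, 2))   # 4 6 14
```

For example, $C(3,2) = (8 + 2 \cdot 2)/3 = 4$, with classes represented by 000, 001, 011 and 111.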

Then we have the following: 

Proposition 5. Suppose $w$ is a word over $\Sigma_k$ that contains every square in $\Sigma_k^{2n}$ as a factor.
Then $|w| \geq k^n + nC(n,k) \geq 2k^n$.



Proof. Let $(x^{(1)}, x^{(2)}, \ldots, x^{(k^n)})$ be the ordered list of the words in $\Sigma_k^n$ such that
$(x^{(1)})^2, (x^{(2)})^2, \ldots, (x^{(k^n)})^2$ appear in $w$ in that order. Observe that if
$x^{(i)} \not\sim x^{(i+1)}$, then $(x^{(i)})^2$ and $(x^{(i+1)})^2$ can overlap in at most $n-1$
bits in $w$. Therefore, every time $x^{(i)}, x^{(i+1)}$ lie in different conjugate classes, there are at least
$n$ blocks of length $2n$ in $w$ between $(x^{(i)})^2$ and $(x^{(i+1)})^2$ that are not squares. Since there are
$C(n,k)$ conjugate classes in $\Sigma_k^n$, we see that there are at least $n(C(n,k) - 1)$ blocks of length
$2n$ in $w$ that are not squares.

Therefore, there are at least $k^n + n(C(n,k) - 1)$ blocks of length $2n$ in $w$, and so

$|w| \geq k^n + n(C(n,k) - 1) + 2n - 1 \geq k^n + nC(n,k) \geq 2k^n$,

and our claim follows. □



We next show that there is a word $w$ of length approximately $(2 + \frac{1}{k})k^n$ over $\Sigma_k$ that contains all squares
of length $2n$.

Let $s$ be a De Bruijn sequence on $\Sigma_k^{n-1}$. We repeat the first $n-2$ bits of $s$ at the end
and view it as an ordinary (i.e. not circular) word, so $s$ has length $k^{n-1} + n - 2$. Also,
given $u \in \Sigma_k^n$, define $\delta(u) := \min\{p \geq 1 : u[p+1..n]u[1..p] = u\}$. Note that $\delta(u) = n$ if and
only if $u$ is primitive.
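As a small illustration, $\delta(u)$ can be computed by trying the rotations of $u$ in order. The function name `delta` is ours, and words are represented as Python strings.

```python
def delta(u):
    """Smallest p >= 1 such that rotating u left by p positions gives u back."""
    n = len(u)
    return next(p for p in range(1, n + 1) if u[p:] + u[:p] == u)


print(delta("000"), delta("001"), delta("0101"))   # 1 3 2
```

Rotating by $p = n$ always returns $u$ itself, so the search never fails, and $\delta(u)$ is the size of the conjugate class of $u$.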

We construct w by the following algorithm: 



1. Step $j$, $1 \leq j \leq k^{n-1}$:

(a) For $i \in \{0, \ldots, k-1\}$, if $(s[j, j+n-2]\,i)^2$ has not yet appeared in $w$, we accept $i$,
and append $(s[j, j+n-2]\,i)^{1+\delta(s[j, j+n-2]\,i)/n}$ to the end of $w$. Otherwise, we reject
$i$ and append nothing.

(b) Append $s[j]$.

2. Step $k^{n-1}+1$: Append $s[k^{n-1}+1, k^{n-1}+n-2]$.



For example, when $k = 2$, $n = 3$, let $s = 00110$ be our De Bruijn word on $\Sigma_2^2$. Then the
algorithm runs as follows:

Step 1: $s[1,2] = (00)$, algorithm accepts both 0 and 1. Also, $\delta(000) = 1$ and $\delta(001) = 3$. Thus,
the algorithm adds $(000)^{4/3}(001)^2$ to $w$, followed by $s[1] = 0$;

Step 2: $s[2,3] = (01)$, algorithm rejects 0 (because $(010)^2$ has already appeared) but accepts 1
(because $(011)^2$ has not yet appeared), adds $(011)^2$ to $w$, followed by $s[2] = 0$;

Step 3: $s[3,4] = (11)$, algorithm rejects 0, accepts 1, adds $(111)^{4/3}$ to $w$, followed by $s[3] = 1$;

Step 4: $s[4,5] = (10)$, algorithm rejects both 0 and 1, adds $s[4] = 1$ to $w$;

Step 5: algorithm adds $s[5] = 0$ to $w$;

Output: $w = 0000001001001101101111110$.
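The steps above can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the names `delta` and `squares_word` are ours, the linearised De Bruijn word $s$ is assumed to be supplied by the caller with length $k^{n-1} + n - 2$, and symbols are digits (so $k \leq 10$).

```python
def delta(u):
    # Smallest p >= 1 such that rotating u left by p positions gives u back.
    return next(p for p in range(1, len(u) + 1) if u[p:] + u[:p] == u)


def squares_word(s, n, k):
    """Run the construction on a linearised De Bruijn word s of Sigma_k^(n-1)."""
    w = ""
    for j in range(k ** (n - 1)):          # steps 1..k^(n-1), with 0-based j
        prefix = s[j:j + n - 1]            # s[j, j+n-2] in the 1-based notation
        for i in range(k):                 # part (a)
            v = prefix + str(i)
            if v + v not in w:             # square not yet present: accept i
                w += v + v[:delta(v)]      # v^(1 + delta(v)/n)
        w += s[j]                          # part (b)
    return w + s[k ** (n - 1):]            # final step: remaining n-2 symbols


w = squares_word("00110", 3, 2)
print(w)        # 0000001001001101101111110
print(len(w))   # 25
```

The fractional power $v^{1+\delta(v)/n}$ is realised as $v$ followed by its first $\delta(v)$ symbols, so 000 contributes 0000 and 001 contributes 001001, as in Step 1 of the example.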

For convenience, given $v \in \Sigma_k^n$, we let $v^{(p)}$ denote $v[p+1..n]v[1..p]$. Then we have the
following:

Proposition 6. Suppose $w$ is the word constructed by the above algorithm. Then we have
the following:

(i) For every word $v \in \Sigma_k^n$, $v^2$ appears exactly once in $w$;
(ii) $w$ has length $k^n + nC(n,k) + k^{n-1} + n - 2$.

Proof. We prove (i) by showing that for every $v \in \Sigma_k^n$, there exists a unique $p \in \{0, 1, \ldots, n-1\}$
such that $(v^{(p)})^{2+(\delta(v^{(p)})-1)/n}$ is a factor of $w$. Note that $(v^{(p)})^{2+(\delta(v^{(p)})-1)/n}$ contains the square
of each conjugate of $v$ exactly once.

We pick the smallest $j$ such that $s[j, j+n-2]$ is a prefix of some conjugate of $v$, say
$v^{(p)}$. Then we know that at step $j$, $w$ does not contain $(v^{(p)})^2$ and would accept $i := v^{(p)}[n]$,
so $(v^{(p)})^{1+\delta(v^{(p)})/n}$ is appended to $w$.

If at step $j$, some index bigger than $i$ is accepted, then we know the block $s[j, j+n-2] =
v^{(p)}[1, n-1]$ immediately follows, giving us the desired power of $v^{(p)}$. Otherwise, we know
that $s[j]$ gets added to $w$ at the end of step $j$.

Then, if any index is accepted in step $j+1$, then $s[j+1, j+n-1]$ is added to $w$, and
we get our desired power of $v^{(p)}$. Otherwise, we just add $s[j+1]$ at the end of step $j+1$.
Proceeding in this manner, we see that the algorithm always adds $s[j, j+n-2]$ immediately
after adding $(v^{(p)})^{1+\delta(v^{(p)})/n}$ at step $j$. If $j + n - 2 \leq k^{n-1}$, then this happens by step $j + n - 2$.
Otherwise, this is taken care of at the very last step, which adds the last $n-2$ bits of $s$ to
$w$.

For (ii), we know from (i) that there are exactly $k^n$ blocks in $w$ that are squares. Also,
it is not hard to see that throughout the algorithm, the number of indices accepted is
exactly $C(n,k)$. Furthermore, every time an index is accepted in part (a) of a step, exactly
$n$ non-square blocks of length $2n$ are created. The (b) steps and the final step together add
$|s| = k^{n-1} + n - 2$ bits to $w$. Therefore, $w$ has length $k^n + nC(n,k) + k^{n-1} + n - 2$. □
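To get a feeling for how close the construction comes to the lower bound of Proposition 5, the two expressions can be tabulated for small parameters. The sketch below only evaluates the formulas stated in Propositions 5 and 6; the function names are ours.

```python
from math import gcd


def num_classes(n, k):
    # C(n, k) = (1/n) * sum over divisors d of n of phi(d) * k^(n/d).
    phi = lambda d: sum(1 for m in range(1, d + 1) if gcd(m, d) == 1)
    return sum(phi(d) * k ** (n // d)
               for d in range(1, n + 1) if n % d == 0) // n


def lower_bound(n, k):
    # Proposition 5: |w| >= k^n + n*C(n, k).
    return k ** n + n * num_classes(n, k)


def constructed_length(n, k):
    # Proposition 6 (ii): k^n + n*C(n, k) + k^(n-1) + n - 2.
    return lower_bound(n, k) + k ** (n - 1) + n - 2


for n, k in [(3, 2), (4, 2), (3, 3)]:
    print(n, k, lower_bound(n, k), constructed_length(n, k))
```

For $k = 2$, $n = 3$ this gives a lower bound of $8 + 3 \cdot 4 = 20$ and a constructed length of 25, matching the length of the example word above. The gap between the two is always $k^{n-1} + n - 2$, the length of $s$.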

We note that the algorithm can possibly be improved by observing that we actually do
not need $s$ to be a De Bruijn sequence on $\Sigma_k^{n-1}$. Let $S(n,k)$ denote the size of the smallest
subset $S \subseteq \Sigma_k^{n-1}$ such that

1. for all $u \in \Sigma_k^n$, there exists $v \in S$ such that $v$ is a prefix of some conjugate of $u$;

2. there is a De Bruijn sequence on $S$.

Then the algorithm is still correct if we replace $s$ by this shorter De Bruijn word. Doing
so would replace the $k^{n-1}$ term in the length of $w$ by $S(n,k)$. Unfortunately, we are unable
to provide any non-trivial upper bound for $S(n,k)$.